    Metaphorical Expressions in Automatic Arabic Sentiment Analysis

    In recent years, Arabic language resources and NLP tools have been developing rapidly. Sentiment analysis is one of the important tasks in Arabic natural language processing. While significant improvements have been achieved in this research area, existing computational models and tools still lack the ability to deal with Arabic metaphorical expressions. Metaphors play an important role in Arabic due to the language's unique history and culture: they provide a linguistic mechanism for expressing ideas and notions that may differ from their surface form. Therefore, to identify the true sentiment of Arabic language data, a computational model needs to be able to "read between the lines". In this paper, we examine the issue of metaphors in automatic Arabic sentiment analysis through an experiment in which we observe the performance of a state-of-the-art Arabic sentiment tool on metaphors and analyse the results to gain deeper insight into the issue. Our experiment clearly shows that metaphors have a significant impact on the performance of current Arabic sentiment tools; developing Arabic language resources and computational models for Arabic metaphors is therefore an important task.
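
    The experiment's outline is straightforward to reproduce: run an off-the-shelf Arabic sentiment classifier on literal and metaphorical inputs and compare its outputs. A minimal sketch follows, using one publicly available Arabic sentiment checkpoint purely for illustration (the paper's evaluated tool is not named here and may differ):

```python
# Illustrative probe: does an Arabic sentiment model handle metaphor?
# The checkpoint below is a public Arabic sentiment model used only as
# an example; it is not necessarily the tool evaluated in the paper.
from transformers import pipeline

sentiment = pipeline(
    "text-classification",
    model="CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment",
)

examples = [
    "الخدمة سيئة جدا",  # literal: "the service is very bad" (negative)
    "قلبه من ذهب",      # metaphor: "his heart is of gold" (positive)
]
for text in examples:
    print(text, "->", sentiment(text)[0])  # {'label': ..., 'score': ...}
```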

    Creating and validating multilingual semantic representations for six languages: expert versus non-expert crowds

    Creating high-quality, wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging and time-consuming manual task. It has traditionally been performed by linguistic experts: a slow and expensive process. We present an experiment in which we adapt and evaluate crowdsourcing methods that employ native speakers to generate a list of coarse-grained senses under a common multilingual semantic taxonomy for sets of words in six languages. 451 non-experts (including 427 Mechanical Turk workers) and 15 expert participants manually annotated 250 words with semantic senses for the Arabic, Chinese, English, Italian, Portuguese and Urdu lexicons. To avoid erroneous (spam) crowdsourced results, we used a novel task-specific two-phase filtering process in which users were asked to identify synonyms in the target language and to remove erroneous senses.
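
    The two-phase filter lends itself to a simple formulation: phase one screens each worker against words whose synonyms are already known, and phase two keeps sense annotations only from workers who passed. A hedged sketch, with a hypothetical threshold and data structures that are not from the paper:

```python
# Sketch of a task-specific two-phase crowdsourcing filter.
# The 0.8 threshold and the data structures are hypothetical.

def passes_synonym_check(answers, gold_synonyms, threshold=0.8):
    """Phase 1: the worker must recognise enough known synonyms."""
    correct = sum(1 for word, choice in answers.items()
                  if choice in gold_synonyms.get(word, set()))
    return correct / max(len(answers), 1) >= threshold

def filter_annotations(workers, gold_synonyms):
    """Phase 2: keep sense annotations only from passing workers."""
    kept = []
    for worker in workers:
        if passes_synonym_check(worker["synonym_answers"], gold_synonyms):
            kept.extend(worker["sense_annotations"])
    return kept
```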

    Development of the multilingual semantic annotation system

    This paper reports on our research to generate multilingual semantic lexical resources and to develop multilingual semantic annotation software, which assigns each word in running text to a semantic category based on a lexical semantic classification scheme. Such tools play an important role in developing intelligent multilingual NLP, text mining and ICT systems. In this work, we aim to extend an existing English semantic annotation tool to cover a range of languages, namely Italian, Chinese and Brazilian Portuguese, by bootstrapping new semantic lexical resources: we automatically translate existing English semantic lexicons into these languages using a set of bilingual dictionaries and word lists. In our experiment, with minor manual improvement of the automatically generated semantic lexicons, the prototype tools based on the new lexicons achieved an average lexical coverage of 79.86% and an average annotation precision of 71.42% (if only precise annotations are considered) or 84.64% (if partially correct annotations are included) across the three languages. Our experiment demonstrates that it is feasible to rapidly develop prototype semantic annotation tools for new languages by automatically bootstrapping new semantic lexicons from existing ones.
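
    The bootstrapping step itself can be pictured as carrying semantic tags across a bilingual dictionary. A minimal sketch with invented entries and deliberately simplified formats (the real lexicons and dictionaries are far larger and richer):

```python
# Bootstrap a target-language semantic lexicon from an English one by
# letting each translation inherit the English word's semantic tags.
# Entries, tags and formats here are invented for illustration only.
english_lexicon = {
    "bank": {"I1.1", "W3"},   # money-related / geographical senses
    "river": {"W3"},
}
en_to_it = {                  # bilingual dictionary, English -> Italian
    "bank": ["banca", "riva"],
    "river": ["fiume"],
}

italian_lexicon = {}
for en_word, tags in english_lexicon.items():
    for it_word in en_to_it.get(en_word, []):
        # inherited tags are later pruned by minor manual improvement,
        # e.g. removing the money tag from "riva" (riverbank)
        italian_lexicon.setdefault(it_word, set()).update(tags)

print(italian_lexicon)
```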

    Feasibility of Emotions as Features for Suicide Ideation Detection in Social Media

    Detecting suicide-related social media messages is an important problem: such messages can reveal warning signs of suicidal behaviour. This paper examines the efficacy of using emotions as the sole features for detecting suicide-related messages. We investigated two methods, which use a single emotion and a set of seven emotions as features, respectively. For emotion classification, we used a transformer-based classifier named "Emotion English DistilRoBERTa-base". For detecting suicide-related messages, we tested Naive Bayes and Support Vector Machine classifiers. As our training/test data for suicide message detection, we used a publicly available dataset collected from Reddit in which each post is labelled as "suicide" or "non-suicide". Our method obtained accuracies of 76.2% and 76.8% for detecting suicide-related messages with Naive Bayes and Support Vector Machine respectively. Our experiment also shows that three emotion categories, "anger", "fear" and "sadness", have the strongest correlation with suicide-related messages.
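
    A rough sketch of the seven-emotion feature pipeline described above, assuming the publicly released checkpoint j-hartmann/emotion-english-distilroberta-base (a common public distribution of "Emotion English DistilRoBERTa-base") and placeholder texts in place of the labelled Reddit dataset:

```python
# Sketch: emotion scores as the only features for suicide-message detection.
# Assumes the public checkpoint "j-hartmann/emotion-english-distilroberta-base",
# which scores 7 emotions: anger, disgust, fear, joy, neutral, sadness, surprise.
from transformers import pipeline
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
import numpy as np

emotion = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base",
                   top_k=None)  # return scores for all 7 emotions

def emotion_features(texts):
    """Turn each text into a fixed-order 7-dim vector of emotion scores."""
    order = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
    feats = []
    for scores in emotion(texts, truncation=True):
        by_label = {s["label"]: s["score"] for s in scores}
        feats.append([by_label[l] for l in order])
    return np.array(feats)

# Placeholder data; the paper uses a labelled Reddit dataset instead.
texts = ["I can't see a way forward anymore", "Great day at the park!"]
labels = [1, 0]  # 1 = suicide-related, 0 = not

X = emotion_features(texts)
clf = SVC().fit(X, labels)  # or GaussianNB().fit(X, labels) for Naive Bayes
print(clf.predict(emotion_features(["feeling hopeless tonight"])))
```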

    Building a Spanish lexicon for corpus analysis

    This paper describes the creation of a semantically annotated Spanish lexicon for analysing larger corpora in the Spanish language. The semantic resources most widely used today, WordNet, FrameNet, PDEV and USAS, have mainly served English-language research. A large Spanish lexicon will allow a greater number of corpus studies in Spanish to be undertaken. We describe the steps followed in constructing the lexicon, the difficulties encountered along the way, and the solutions used to overcome them. Finally, the lexicon will enable specific research tasks to be carried out, such as metaphor analysis, critical discourse analysis (CDA) studies and even NLP studies.
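
    For a sense of what such a resource contains, a USAS-style lexicon pairs each word form and part of speech with one or more semantic tags. The Spanish entries below are invented examples, not the paper's data:

```python
# Illustrative USAS-style lexicon entries; the words and the tag
# assignments are invented examples, not the paper's actual data.
spanish_lexicon = {
    ("casa", "NOUN"): ["H1"],     # H1 = architecture, houses, buildings
    ("correr", "VERB"): ["M1"],   # M1 = moving, coming and going
    ("feliz", "ADJ"): ["E4.1+"],  # E4.1+ = happy (positive polarity)
}
```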

    Contrastive Training with More Data

    This paper proposes a new method of contrastive training over multiple data points, focusing on the scaling issue that arises when using in-batch negatives. Our approach compares transformer training with dual encoders against training with multiple encoders. Our method offers a feasible way to improve loss modelling as encoders scale.
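
    For context, the in-batch-negatives setup whose scaling behaviour the paper targets is usually written as an InfoNCE-style loss over a batch similarity matrix: each query's positive is its paired document, and every other document in the batch serves as a negative, so batch size controls how many negatives each update sees. A generic dual-encoder sketch (not the paper's proposed multi-encoder method):

```python
# Generic dual-encoder contrastive loss with in-batch negatives.
# Positives sit on the diagonal of the (B, B) similarity matrix;
# all off-diagonal documents act as negatives for each query.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q, d, temperature=0.05):
    """q, d: (B, dim) L2-normalised query/document embeddings."""
    logits = q @ d.T / temperature      # (B, B) similarity matrix
    targets = torch.arange(q.size(0))   # positives on the diagonal
    return F.cross_entropy(logits, targets)

B, dim = 32, 128
q = F.normalize(torch.randn(B, dim), dim=-1)
d = F.normalize(torch.randn(B, dim), dim=-1)
print(in_batch_contrastive_loss(q, d))
```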